
    The multilingual entity task (MET) overview

    The Message Understanding Conference-6 (MUC-6) evaluation of named entity identification demonstrated that systems are approaching human performance on English language texts [10]. Informal and anonymous, the MET provided a new opportunity to assess progress on the same task in Spanish, Japanese, and Chinese. Preliminary results indicate that MET systems in all three languages performed comparably to those of the MUC-6 evaluation in English. Based upon the Named Entity Task Guidelines [11], the task was to locate and tag with SGML named entity expressions (people, organizations, and locations), time expressions (time and date), and numeric expressions (percentage and money) in Spanish texts from Agence France Presse, in Japanese texts from Kyodo newswire, or in Chinese texts from Xinhua newswire. Across languages the keywords "press conference" retrieved a rich subcorpus of texts, covering a wide spectrum of topics. Frequency and types of expressions vary in the three language sets [2] [8] [9]. The original task guidelines were modified so that the core guidelines were language independent with language specific rules appended. The schedule was quite abbreviated. In the fall, Government language teams retrieved training and test texts with multilingual software for the Fast Data Finder (FDF), refined the MUC-6 guidelines, and manually tagged 100 training texts using the SRA Named Entity Tool. In January, the training texts were released along with 200 sample unannotated training texts to the participating sites. A dry run was held in late March and early April, and in late April the official test on 100 texts was performed anonymously. The language texts were supplied by the Linguistic Data Consortium (LDC) at the University of Pennsylvania. SAIC created language versions of the scoring program and provided technical support throughout. Both commercial and academic groups participated. Two groups, New Mexico State University/Com
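    The tagging task described above is easy to illustrate concretely. The Python sketch below shows what MUC-style SGML annotation of the three expression classes looks like and how tagged mentions can be read back out of a text. The tag inventory (ENAMEX/TIMEX/NUMEX with a TYPE attribute) follows the MUC-6 conventions the guidelines build on, but the sentence and all counts are invented for illustration and are not drawn from the MET corpora.

```python
import re
from collections import Counter

# A toy sentence annotated in the MUC-style SGML format: named entities
# (ENAMEX), time expressions (TIMEX), and numeric expressions (NUMEX).
# The text itself is invented purely for illustration.
annotated = (
    '<ENAMEX TYPE="PERSON">Jane Smith</ENAMEX> of '
    '<ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> held a press conference in '
    '<ENAMEX TYPE="LOCATION">Madrid</ENAMEX> on '
    '<TIMEX TYPE="DATE">April 30</TIMEX>, announcing profits of '
    '<NUMEX TYPE="MONEY">$2 million</NUMEX>, up '
    '<NUMEX TYPE="PERCENT">12%</NUMEX>.'
)

# Pull out (tag, type, text) triples from the SGML markup.
pattern = re.compile(r'<(ENAMEX|TIMEX|NUMEX) TYPE="([A-Z]+)">(.*?)</\1>')
mentions = pattern.findall(annotated)

for tag, etype, text in mentions:
    print(f"{tag:6s} {etype:12s} {text}")

# Frequencies of expression types, the kind of statistic reported per
# language in the evaluation.
print(Counter(etype for _, etype, _ in mentions))
```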

    Wide-coverage deep statistical parsing using automatic dependency structure annotation

    A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004; Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing "deep" hand-crafted wide-coverage with "shallow" treebank- and machine-learning based parsers at the level of dependencies, using simple and automatic methods to convert tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments in Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18% over the most recent results of 80.55% for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002).
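    As a rough illustration of the evaluation machinery described above, the Python sketch below computes a micro-averaged f-score over per-sentence sets of dependency triples and runs a two-sided Approximate Randomization Test (Noreen 1989) on the f-score difference between two systems. The (head, label, dependent) triple representation, the per-sentence swapping scheme, and all names and toy data are assumptions made for illustration; this is not the article's actual dependency-bank format or scoring code.

```python
import random

def fscore(gold, system):
    """Micro-averaged f-score over per-sentence sets of dependency triples.

    gold and system are parallel lists; each element is a set of
    (head, label, dependent) triples for one sentence (an assumed,
    simplified stand-in for the dependency-bank formats).
    """
    tp = sum(len(g & s) for g, s in zip(gold, system))
    sys_total = sum(len(s) for s in system)
    gold_total = sum(len(g) for g in gold)
    p = tp / sys_total if sys_total else 0.0
    r = tp / gold_total if gold_total else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def approximate_randomization(gold, sys_a, sys_b, rounds=10000, seed=0):
    """Two-sided approximate randomization test (Noreen 1989): repeatedly
    swap the two systems' outputs per sentence and count how often the
    resulting f-score difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(fscore(gold, sys_a) - fscore(gold, sys_b))
    at_least_as_large = 0
    for _ in range(rounds):
        shuf_a, shuf_b = [], []
        for a, b in zip(sys_a, sys_b):
            if rng.random() < 0.5:
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        if abs(fscore(gold, shuf_a) - fscore(gold, shuf_b)) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (rounds + 1)  # smoothed p-value

# Toy usage with three sentences of invented triples.
gold  = [{("saw", "SUBJ", "John"), ("saw", "OBJ", "Mary")},
         {("ran", "SUBJ", "dog")},
         {("ate", "SUBJ", "cat"), ("ate", "OBJ", "fish")}]
sys_a = [{("saw", "SUBJ", "John"), ("saw", "OBJ", "Mary")},
         {("ran", "SUBJ", "dog")},
         {("ate", "SUBJ", "cat")}]
sys_b = [{("saw", "SUBJ", "John")},
         {("ran", "OBJ", "dog")},
         {("ate", "SUBJ", "cat")}]
print("f(A) =", round(fscore(gold, sys_a), 3))
print("f(B) =", round(fscore(gold, sys_b), 3))
print("p    =", approximate_randomization(gold, sys_a, sys_b, rounds=2000))
```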